---
output: 
  bookdown::pdf_document2:
    fig_caption: yes
    keep_tex: yes
    keep_md: yes
    toc: no
    number_sections: yes
    latex_engine: xelatex
    pandoc_args: --lua-filter=multibib.lua
title: Macrointerest Across Countries
date: "`r format(Sys.time(), '%B %d, %Y')`"
editor_options: 
  markdown: 
    wrap: sentence
  chunk_output_type: console
tables: true # enable longtable and booktabs
citation_package: natbib
citeproc: false
fontsize: 12pt
indent: true
linestretch: 1.5 # double spacing using linestretch 1.5
bibliography:
  text: dcpo-macrointerest.bib
  app: dcpo-macrointerest-app.bib
biblio-style: apsr
citecolor: black
linkcolor: black
endnote: no
header-includes:
      - \usepackage{array}
      - \usepackage{caption}
      - \usepackage{float}
      - \usepackage{placeins}
      - \usepackage{graphicx}
      - \usepackage{siunitx}
      - \usepackage{colortbl}
      - \usepackage{xcolor}
      - \usepackage{multirow}
      - \usepackage{hhline}
      - \usepackage{calc}
      - \usepackage{tabularx}
      - \usepackage{threeparttable}
      - \usepackage{wrapfig}
      - \usepackage{fullpage}
      - \usepackage{lscape} #\usepackage{lscape} better for printing, page displayed vertically, content in landscape mode, \usepackage{pdflscape} better for screen, page displayed horizontally, content in landscape mode
      - \newcommand{\blandscape}{\begin{landscape}}
      - \newcommand{\elandscape}{\end{landscape}}
      - \usepackage{titlesec}
      - \titleformat*{\section}{\normalsize\bfseries}
      - \titleformat*{\subsection}{\normalsize\itshape}
      - \usepackage{titling} #use \maketitle repeatedly  
      - \renewcommand{\topfraction}{.85} # Adjust LaTeX placement rules per
      - \renewcommand{\textfraction}{.15}
      - \renewcommand{\floatpagefraction}{.66}
      - \setcounter{topnumber}{3}
      - \setcounter{bottomnumber}{3}
      - \setcounter{totalnumber}{4}
      - \usepackage{tabularray}
      - \renewcommand{\bottomfraction}{.80} # https://bookdown.org/yihui/rmarkdown-cookbook/figure-placement.html
---

\pagenumbering{gobble}

# Authors {.unnumbered}

Yue Hu, ORCID: <https://orcid.org/0000-0002-2829-3971>, Associate Professor, Department of Political Science, Tsinghua University, [yuehu\@tsinghua.edu.cn](mailto:yuehu@tsinghua.edu.cn){.email}

\vspace{1cm}

\noindent Frederick Solt, ORCID: <https://orcid.org/0000-0002-3154-6132>, Associate Professor, Department of Political Science, University of Iowa, [frederick-solt\@uiowa.edu](mailto:frederick-solt@uiowa.edu){.email}

\pagebreak

```{=tex}
\renewcommand{\baselinestretch}{1}
\selectfont
\maketitle
\renewcommand{\baselinestretch}{1.5}
\selectfont
```

```{=tex}
\begin{abstract}
The extent to which the public takes an interest in politics has long been argued to be foundational to democracy, but the want of appropriate data has prevented cross-national and longitudinal analysis. This letter takes advantage of recent advances in latent-variable modeling of aggregate survey responses and a comprehensive collection of survey data to generate dynamic comparative estimates of macrointerest, that is, aggregate political interest, for over a hundred countries over the past four decades. These macrointerest scores are validated with other aggregate measures of political interest and of other types of political engagement. A cross-national and longitudinal analysis of macrointerest in advanced democracies reveals that along with election campaigns and inclusive institutions, it is good economic conditions, not bad times, that spur publics to greater interest in politics. 
\end{abstract}
```
\pagebreak

\pagenumbering{arabic}

```{r setup, include=FALSE}
options(tinytex.verbose = TRUE)

knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  dpi = 600,
  cache = TRUE,
  fig.width = 7,
  fig.height = 4,
  plot = function(x, options) {
    hook_plot_tex(x, options)
  }
)


# Load necessary package
if (!requireNamespace("usethis", quietly = TRUE)) install.packages("usethis")

# Function to check GitHub credentials
check_github_credentials <- function() {
  # Check if GitHub credentials are set
  github_token <- Sys.getenv("GITHUB_PAT")
  
  if (nzchar(github_token)) {
    return(TRUE)
  } else {
    message("Warning: GitHub credentials are not set. Please follow the instructions below to configure your GitHub token:")
    message("1. Create a GitHub Personal Access Token (PAT): https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token")
    message("2. Set up your token securely in R: https://usethis.r-lib.org/reference/edit_r_environ.html")
    message("3. After creating the PAT, add the following line to your .Renviron file: GITHUB_PAT=your_personal_access_token")
    message("4. Restart R for the changes to take effect.")
    return(FALSE)
  }
}

# Check credentials; warn if they are not set
if (!check_github_credentials()) message("Package installation aborted. Please set your GitHub credentials as instructed above.")

# If `DCPOtools` is not yet installed:

if (!require(gesisdata))
  remotes::install_github("fsolt/gesisdata")

if (!require(psych)) install.packages("psych")

if (!require(cmdstanr)) install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))

if (!require(DCPOtools))  remotes::install_github("fsolt/DCPOtools")

# if (!require(tabulizer)) devtools::install_github("ropensci/tabulizer", verbose = TRUE)

if(!require(rstan)) install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies = TRUE)

if(!require(mvtnorm)) install.packages("mvtnorm", dependencies = TRUE)

if(!require(brms)) install.packages("brms", dependencies = TRUE)

if (!require(pacman)) install.packages("pacman")

library(pacman)
# load all the packages you will use below 
p_load(
  DCPOtools,
  cmdstanr,
  tidyverse,
  here,
  rio,
  countrycode,
  patchwork,
  ggthemes,
  ggdist,
  imputeTS,
  rsdmx,
  osfr,
  tabulapdf,
  brms,
  bayestestR,
  tidybayes,
  repmis,
  rvest,
  vroom,
  tinytable,
  kableExtra,
  modelsummary
) 
```

```{r loadData}
load(here("data", "theta_summary.rda")) # theta_summary
load(here("data", "theta_results.rda")) # theta_results
load(here("data", "results.rda")) # analysis

```


```{r define_funs}
# define functions
validation_plot <- function(v_data_raw,
                            lab_x = .38, lab_y = 92,
                            theta_summary, theta_results) {
    
    # defaults per https://stackoverflow.com/a/49167744/2620381
    if ("theta_summary" %in% ls(envir = .GlobalEnv) & missing(theta_summary))
        theta_summary <- get("theta_summary", envir = .GlobalEnv)
    if ("theta_results" %in% ls(envir = .GlobalEnv) & missing(theta_results))
        theta_results <- get("theta_results", envir = .GlobalEnv)

    median_val <- Vectorize(function(x) median(1:x),
                            vectorize.args = "x")
    
    v_vars <- v_data_raw %>% 
      select(item0 = item,
             title = title) %>% 
      distinct() %>% 
      mutate(v_val = str_extract(item0, "\\d+") %>% 
               as.numeric() %>% 
               median_val(.) %>%
               `+`(if_else(str_detect(item0, "disc3"), .1,
                           if_else(str_detect(item0, "int5_allbus"),
                                   0,
                                   .6))) %>% 
               round())
    
    validation_summarized <- v_data_raw %>% 
      DCPOtools::format_dcpo(scale_q = v_vars$item0[[1]], # these arguments are required
                             scale_cp = 1) %>% # but they don't matter
      pluck("data") %>% 
      mutate(item0 = str_remove(item, " \\d or higher")) %>% 
      # right_join(v_vars, by = "item0") %>%
      right_join(v_vars) %>%
      arrange(title) %>% 
      mutate(title = factor(title, 
                            levels = v_data_raw %>%
                              pull(title) %>%
                              unique())) %>% 
      filter(str_detect(item, paste(v_val, "or higher"))) %>%
      mutate(iso2c = countrycode::countrycode(country,
                                              origin = "country.name",
                                              destination = "iso2c",
                                              warn = FALSE),
             prop = y_r/n_r,
             se = sqrt((prop*(1-prop))/n),
             prop_90 = prop + qnorm(.9)*se,
             prop_10 = prop - qnorm(.9)*se) %>%
      inner_join(theta_summary %>% select(-kk, -tt), by = c("country", "year"))
    
    validation_cor <- theta_results %>%
      inner_join(validation_summarized %>%
                   select(country, year, title, prop, se),
                 by = c("country", "year")) %>% 
      rowwise() %>% 
      mutate(sim = rnorm(1, mean = prop, sd = se)) %>% 
      ungroup() %>% 
      select(title, theta, sim, draw) %>% 
      nest(data = c(theta, sim)) %>% 
      mutate(r = lapply(data, function(df) cor(df)[2,1]) %>% 
               unlist()) %>%
      select(-data) %>% 
      group_by(title) %>% 
      summarize(r = paste("R =", round(mean(r), 2)))

    if ({validation_summarized %>%
        pull(country) %>%
        unique() %>% 
        length()} > 1) {
      val_plot <- validation_summarized %>%
        ggplot(aes(x = mean,
                   y = prop * 100)) +
        geom_segment(aes(x = q10, xend = q90,
                         y = prop * 100, yend = prop * 100),
                     na.rm = TRUE,
                     alpha = .2) +
        geom_segment(aes(x = mean, xend = mean,
                         y = prop_90 * 100, yend = prop_10 * 100),
                     na.rm = TRUE,
                     alpha = .2) +
        geom_smooth(method = 'lm', formula = 'y ~ x', se = FALSE) +
        facet_wrap(~ title, ncol = 4) +
        geom_label(data = validation_cor, aes(x = lab_x,
                                              y = lab_y,
                                              label = r),
                   size = 2)
    } else {
      val_plot <- validation_summarized %>%
        ggplot(aes(x = year,
                   y = mean)) +
        geom_line() +
        geom_ribbon(aes(ymin = q10,
                        ymax = q90,
                        linetype = NA),
                    alpha = .2) +
        geom_point(aes(y = prop),
                   fill = "black",
                   shape = 21,
                   size = .5,
                   na.rm = TRUE) +
        geom_path(aes(y = prop),
                  linetype = 3,
                  na.rm = TRUE,
                  alpha = .7) +
        geom_segment(aes(x = year, xend = year,
                         y = prop_90, yend = prop_10),
                     na.rm = TRUE,
                     alpha = .2) +
        facet_wrap(~ title, ncol = 4) +
        geom_label(data = validation_cor, aes(x = lab_x,
                                              y = lab_y,
                                              label = r),
                   size = 2)
    }
    
    return(val_plot)
}

covered_share_of_spanned <- function(dcpo_input_raw) {
  n_cy <- dcpo_input_raw %>%
    distinct(country, year) %>% 
    nrow()
  
  spanned_cy <- dcpo_input_raw %>% 
    group_by(country) %>% 
    summarize(years = max(year) - min(year) + 1) %>% 
    summarize(n = sum(years)) %>% 
    pull(n)
  
  {(n_cy/spanned_cy) * 100}
}

get_coef <- function(iv, 
                     results_df = coef_data,
                     type = "both",
                     width = .95,
                     abs = FALSE) {
  result_var <- results_df %>% 
    filter(.width == width) %>% 
    pull(.variable) %>% 
    str_subset(iv)
  
  if (!type=="both") {
    res <- results_df %>% 
      filter(.variable == result_var & .width == width) %>% 
      pull({{type}})
    
    if (abs) {
      res <- as.character(res) %>% 
        str_remove_all("-")
    }
  } else {
    sc <- results_df %>% 
      filter(.variable == result_var & .width == width) %>% 
      pull(std_coef)
    
    ci <- results_df %>% 
      filter(.variable == result_var & .width == width) %>% 
      pull(ci)
    
    if (abs) {
      sc <- str_remove_all(sc, "-")
      ci <- str_remove_all(ci, "-") %>% 
        str_replace("(\\d+\\.?\\d?) to (\\d+\\.?\\d?)", "\\2 to \\1")
    }
    
    res <- paste0(sc, " (95% c.i.: ", ci, ")")
  }
  
  return(res)
}

by2sd <- function(var) {
  dich <- stats::na.omit(unique(var)) %>% 
    sort() %>% identical(c(0, 1))
  if (dich) 
    sd <- 1
  else 
    sd <- 2 * stats::sd(var, na.rm = TRUE)
  
  return(sd)
}

set.seed(324)
```


The public's interest in politics has long been argued to be fundamental to democracy, the foundation for the widespread civic engagement needed to hold elected officials accountable to citizen demands [see, e.g., @Almond1963]. 
<!-- Recent research highlights the substantial influence of political interests in shaping how people process political information, develop attitudes, and respond to political inquiries [@MillerEtAl2023].  -->
More than just boosting engagement, political interest critically determines the quality of political decisions and behaviors, influencing factors like time spent, information collection and utilization, and critical assessment of partisan claims [see, e.g., @LaneEtAl2022].
In light of the growing threats to democracy seen in many countries, measuring the levels and trends of aggregate political interest---macrointerest---and understanding their sources is therefore crucially important [see, e.g., @Foa2016, 10-11]. 

A recent contribution, @Peterson2022, measures macrointerest over time in the United States, but similar data allowing for large-scale cross-sectional time-series assessments have as yet been unavailable.
Although many surveys ask respondents across countries how interested they are in politics, differences in question wording and in response categories have limited scholars' ability to pool the data.
Even absent these issues, in most countries the questions have not been asked frequently enough to provide annual time series.

This letter takes advantage of recent advances in latent-variable modeling of cross-national aggregate survey responses and a comprehensive collection of survey data to generate dynamic comparative estimates of aggregate political interest for over a hundred countries over the past four decades.
It shows that these cross-national macrointerest scores perform well in validation tests.
Finally, as a demonstration of their utility, the letter presents a new test of theories on the circumstances that induce the publics of advanced democracies to take more interest in politics.
The results support arguments that, in these countries, election campaigns, inclusive institutions, and good economic conditions, not bad times, spur greater political interest.

```{r dcpo_input_raw, eval=FALSE}
surveys_interest <- read_csv(here::here("data-raw",
                                        "surveys_interest.csv"),
                             col_types = "cccccc")

dcpo_input_raw <- dcpo_setup(vars = surveys_interest,
                             datapath = here("..",
                                             "data",
                                             "dcpo_surveys"),
                             file = here("data",
                                         "dcpo_input_raw.csv"))
```

```{r tb_summary_stats}
surveys_interest <- read_csv(here("data-raw",
                                        "surveys_interest.csv"),
                             col_types = "cccccc")

dcpo_input_raw <- read_csv(here("data", "dcpo_input_raw.csv"),
                                  col_types = "cdcddcd")

process_dcpo_input_raw <- function(dcpo_input_raw_df) {
  dcpo_input_raw_df %>% 
    mutate(item = if_else(item == "int5_polit" &
                            (year < 1990 | year == 1991),
                          "int3_polit",
                          item),
           r = if_else(item == "int3_polit",
                       r - 2,
                       r)) %>% 
    filter(year >= 1982 & n > 0 & !country=="Northern Ireland") %>% 
    with_min_yrs(3) %>% 
    with_min_cy(5) %>% 
    with_min_yrs(3) %>%
    group_by(country) %>% 
    mutate(cc_rank = n()) %>% 
    ungroup() %>% 
    arrange(-cc_rank)
}

dcpo_input_raw1 <- dcpo_input_raw %>% 
  process_dcpo_input_raw()

n_surveys <- surveys_interest %>% 
  distinct(survey) %>% 
  nrow()

n_items <- dcpo_input_raw1 %>%
  distinct(item) %>% 
  nrow()

n_countries <- dcpo_input_raw1 %>%
  distinct(country) %>% 
  nrow()

n_cy <- dcpo_input_raw1 %>%
  distinct(country, year) %>% 
  nrow() %>% 
  scales::comma()

# inclusive count of calendar years, matching the +1 convention used for spanned_cy
n_years <- as.integer(max(dcpo_input_raw1$year) - min(dcpo_input_raw1$year) + 1)

spanned_cy <- dcpo_input_raw1 %>% 
  group_by(country) %>% 
  summarize(years = max(year) - min(year) + 1) %>% 
  summarize(n = sum(years)) %>% 
  pull(n) %>% 
  scales::comma()

total_cy <- {n_countries * n_years} %>% 
  scales::comma()

year_range <- paste("from",
                    summary(dcpo_input_raw1$year)[1],
                    "to",
                    summary(dcpo_input_raw1$year)[6])

n_cyi <- dcpo_input_raw1 %>% 
  distinct(country, year, item) %>% 
  nrow() %>% 
  scales::comma()

back_to_numeric <- function(string_number) {
  string_number %>% 
    str_replace_all(",", "") %>% # strip every thousands separator, not only the first
    as.numeric()
}

covered_share <- covered_share_of_spanned(dcpo_input_raw1)
```


# Cross-National Macrointerest: The Source Data {.unnumbered}

National and cross-national surveys have asked questions on political interest often over the past four decades, but the resulting data are both sparse, that is, unavailable for many countries and years, and incomparable, generated by many different survey items.
In all, `r n_items` such survey items were asked in no fewer than five country-years in countries surveyed in at least three years; these items were drawn from `r n_surveys` different survey datasets (see online Appendix\nobreakspace{}\@ref(surveys)).

Together, the survey items in the source data were asked in `r n_countries` different countries in at least three time points over the `r n_years` years `r year_range`, yielding a total of `r n_cyi` country-year-item observations.
Observations for every year in each country surveyed would number `r total_cy`, and a complete set of country-year-items would encompass `r {n_countries * n_years * n_items} %>% scales::comma()` observations.
Compared to this hypothetical complete set of country-year-items, the available data are very, very sparse.
More optimistically, there are `r n_cy` country-years in which there is at least _some_ information about the public's interest in politics, that is, some `r round(covered_share)`% of the `r spanned_cy` country-years spanned by these data.
Still, the multitude of different survey items makes these data incomparable and difficult to use together.
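
The share of spanned country-years reported above can be written out as a formula that mirrors the calculation in this paper's replication code. The notation is introduced here only for illustration: $N_{\mathrm{obs}}$ is the number of distinct country-years with at least one survey item, and $t_{k}^{\min}$ and $t_{k}^{\max}$ are the first and last years observed for country $k$.

```{=tex}
\[
\mathrm{covered\ share} = 100 \times \frac{N_{\mathrm{obs}}}{\sum_{k} \left( t_{k}^{\max} - t_{k}^{\min} + 1 \right)}
\]
```

A hypothetical country surveyed only in 1990, 1995, and 2000, for example, would add three country-years to the numerator and eleven to the denominator.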

```{r itemcountry, fig.cap="Countries and Years with the Most Observations in the Source Data", fig.height=3.5, fig.pos='h', cache=FALSE}
items_plot <- dcpo_input_raw1 %>%
  distinct(country, year, item) %>%
  count(item) %>%
  arrange(desc(n)) %>% 
  # head(12) %>% 
  ggplot(aes(forcats::fct_reorder(item, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Country-Years\nObserved") +
  ggtitle("Items")

int4_ess_cy <- dcpo_input_raw1 %>% 
  filter(item == "int4_ess") %>%
  distinct(country, year) %>%
  nrow()

int4_ess_surveys <- dcpo_input_raw1 %>%
  filter(item == "int4_ess") %>%
  distinct(survey) %>%
  pull(survey) %>% 
  str_split(", ") %>% 
  unlist() %>% 
  unique() %>% 
  sort()


countries_plot <- dcpo_input_raw1 %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country)) %>% 
  distinct(country, year, item) %>% 
  count(country) %>%
  arrange(desc(n)) %>% 
  head(12) %>% 
  ggplot(aes(forcats::fct_reorder(country, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Year-Items\nObserved") +
  ggtitle("Countries")

cby_plot <- dcpo_input_raw1 %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(12) %>% 
  ggplot(aes(forcats::fct_reorder(country, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Years\nObserved") +
  ggtitle("Countries")


ybc_plot <- dcpo_input_raw1 %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>%
  ggplot(aes(year, nn)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        # axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Countries\nObserved") +
  ggtitle("Years")

de_obs <- dcpo_input_raw1 %>% 
  distinct(country, year, item) %>%
  count(country) %>%
  filter(country == "Germany") %>%
  pull(n)

others <- dcpo_input_raw1 %>%
  distinct(country, year, item) %>%
  count(country) %>%
  arrange(desc(n)) %>%
  slice(2:5) %>%
  pull(country) %>% 
  knitr::combine_words() %>% 
  str_replace_all("(United|Netherlands)", "the \\1")

countries_cp <- dcpo_input_raw1 %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year, item) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(12) %>% 
  pull(country)

countries_cbyp <- dcpo_input_raw1 %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(12) %>% 
  pull(country)

adding <- setdiff(countries_cbyp, countries_cp) %>% 
  knitr::combine_words()

dropping <- setdiff(countries_cp, countries_cbyp) %>% 
  knitr::combine_words()

y_peak_year <- dcpo_input_raw1 %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>% 
  filter(nn == max(nn)) %>% 
  pull(year)

y_peak_nn <- dcpo_input_raw1 %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>% 
  filter(nn == max(nn)) %>% 
  pull(nn)

data_poorest <- dcpo_input_raw1 %>%
  distinct(country, year, item) %>%
  count(country) %>%
  arrange(n) %>%
  filter(n == 3) %>%
  pull(country) %>% 
  knitr::combine_words() %>% 
  paste0("---", ., "---")

wordify_numeral <- function(x) setNames(c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"), 1:19)[x]

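# count the countries directly; splitting the combined string on commas miscounts two-country lists
n_data_poor <- dcpo_input_raw1 %>%
  distinct(country, year, item) %>%
  count(country) %>%
  filter(n == 3) %>%
  nrow()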

if(n_data_poor < 20) {
  n_data_poorest <- n_data_poor %>% 
    wordify_numeral()
} else {
  n_data_poorest <- n_data_poor
  data_poorest <- " "
}

(countries_plot + cby_plot) / (ybc_plot)
```

In the top left panel of Figure\nobreakspace{}\@ref(fig:itemcountry), the twelve countries with the most country-year-item observations are displayed.
Germany, with `r de_obs` observations, is the best represented country in these source data, followed by `r others`.
At the other end of the scale, there are `r n_data_poorest` countries`r data_poorest`that have only the bare minimum three observations needed to be included in the source dataset at all.
In the top right panel are the dozen countries with the most observed years; this group is similar to that on the left, but with `r adding` joining the list and `r dropping` dropping off.
The bottom panel shows the number of countries observed in each year.
Coverage across countries reached its apex in `r y_peak_year`, when respondents in `r y_peak_nn` countries were asked at least one item about their interest in politics.
The next section describes how these sparse and incomparable survey data were used together with a latent variable model to generate complete time series of macrointerest scores that are comparable across countries.


# Estimating Cross-National Macrointerest {.unnumbered}

Several recent studies have developed latent variable models of aggregate survey responses based on cross-national survey data [see @Claassen2019; @Caughey2019; @McGann2019; @Kolczynska2020].
To estimate the public's interest in politics across countries and over time, this work employs the latest of these methods that is appropriate for data that are both incomparable and sparse, the Dynamic Comparative Public Opinion (DCPO) model elaborated in @Solt2020c.
@Solt2020c demonstrates that the DCPO model provides a better fit to survey data than the models put forward by @Claassen2019 or @Caughey2019.
The @McGann2019 model depends on dense survey data, unlike the sparse data on interest in politics described in the preceding section.
@Kolczynska2020 is the most recent of these five works and builds on each of the others, but the MRP approach developed in that piece is suitable only when the available survey data are dense and ancillary data on population characteristics are also available, so it is similarly inappropriate to this application.
The dyad ratio algorithm employed in @Peterson2022, of course, leverages only over-time variation within a single country and not variation across countries, making it a poor choice for generating cross-national estimates [see @Caughey2019, p. 686].^[
Appendix\nobreakspace{}\@ref(peterson) compares our estimates for the United States with the estimates presented in @Peterson2022.]
The DCPO model is a population-level two-parameter ordinal logistic item response theory model with country-specific item-bias terms.
For a comprehensive description of the DCPO model, see Appendix\nobreakspace{}\@ref(dcpo) and @Solt2020c [, 3-8]; the focus here is on how it deals with the two principal issues raised by the source data, incomparability and sparsity.

The DCPO model accounts for incomparability using three sets of parameters.
First, it incorporates the _difficulty_ of each question's responses, that is, how much interest in politics is indicated by a given response. 
This is most evident with respect to response categories: to say that one is "very interested" in politics, for example, is to exhibit more interest than to say that one is "somewhat interested" or "not very interested."
Here, difficulty is permitted to vary with question wording and the survey project as well.
Second, the DCPO model accounts for each question's _dispersion_, its noisiness with regard to our latent trait.
The lower the dispersion, the more closely changes in responses to the question track changes in macrointerest.
Third, to provide for the possibility that translation issues or cultural differences result in the same question being interpreted differently in different countries, the model estimates _country-specific bias_ parameters that shift the difficulty of all responses for a particular question in a particular country.
Together, the model's difficulty, dispersion, and country-specific bias parameters work to generate comparable estimates of the latent variable of macrointerest from the available but incomparable source data.^[
For how other data issues, such as sample representation, may affect the estimated outcome, see Appendix\nobreakspace{}\@ref(dcpo).]
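
As a simplified sketch of how these three sets of parameters fit together, the expected share of respondents in country $k$ and year $t$ who give a response at least as interested as category $r$ of question $q$ might be written as follows. The notation is assumed here for illustration and deliberately omits the other components of the full specification, which is given in Appendix\nobreakspace{}\@ref(dcpo) and @Solt2020c.

```{=tex}
\[
\eta_{ktqr} = \mathrm{logit}^{-1}\left( \frac{\theta_{kt} - \left( \beta_{qr} + \delta_{kq} \right)}{\alpha_{q}} \right)
\]
```

In this sketch, $\theta_{kt}$ is macrointerest, $\beta_{qr}$ is the difficulty of response $r$ to question $q$, $\delta_{kq}$ is the country-specific bias that shifts all of that question's difficulties in country $k$, and $\alpha_{q} > 0$ is the question's dispersion.
A larger $\alpha_{q}$ flattens the response curve, so that a given change in $\theta_{kt}$ produces a smaller change in observed responses.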

To address sparsity in the source data---unpolled or thinly surveyed years in each country---DCPO uses simple local-level dynamic linear models, i.e., random-walk priors, for each country.
That is, within each country, each year's value of macrointerest is modeled as the previous year's estimate plus a random shock.
These dynamic models smooth the estimates of macrointerest over time and allow estimation even in years for which little or no survey data are available, albeit at the expense of greater measurement uncertainty.
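
In notation assumed here for illustration, with $\theta_{kt}$ denoting macrointerest in country $k$ and year $t$ and $\sigma_{\theta}^{2}$ the variance of the yearly shocks, a random-walk prior of this kind takes the general form below. The exact parameterization used in the DCPO model is given in Appendix\nobreakspace{}\@ref(dcpo).

```{=tex}
\[
\theta_{kt} \sim \mathrm{Normal}\left( \theta_{k,t-1},\ \sigma_{\theta}^{2} \right)
\]
```

Under such a prior, the variance of the change in macrointerest over $m$ consecutive years is $m \sigma_{\theta}^{2}$, which is why estimates for stretches of unpolled years carry wider uncertainty intervals.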

```{r dcpo_input, eval=FALSE, include=FALSE, results=FALSE}
dcpo_input <- DCPOtools::format_dcpo(dcpo_input_raw1,
                                     scale_q = "int4_ess",
                                     scale_cp = 2)
save(dcpo_input, file = here::here("data", "dcpo_input.rda"))
```

```{r dcpo, eval=FALSE, include=FALSE, results=FALSE}
iter <- 1000

dcpo <- system.file("stan", "dcpo.stan", package = "DCPO") |> # locate the model in whichever library holds DCPO
  cmdstan_model()

dcpo_output <- dcpo$sample(
   data = dcpo_input[1:13], 
   max_treedepth = 14,
   adapt_delta = 0.99,
   step_size = 0.005,
   seed = 324, 
   chains = 4, 
   parallel_chains = 4,
   iter_warmup = iter/2,
   iter_sampling = iter/2,
   refresh = iter/50
 )

results_path <- here::here(file.path("data", 
                                     iter, 
                                  {str_replace_all(Sys.time(),
                                                      "[- :]",
                                                   "") %>%
                                         str_replace("\\.\\d*$",
                                                     "")}))

dir.create(results_path, 
           showWarnings = FALSE, 
           recursive = TRUE)

dcpo_output$save_data_file(dir = results_path,
                           random = FALSE)
dcpo_output$save_output_files(dir = results_path,
                              random = FALSE)
```

```{r dcpo_output, eval=FALSE}
if (!exists("results_path")) {
  latest <- "20240129125552"
  results_path <- here::here("data", "1000", latest)

  # Define OSF_PAT in .Renviron: https://docs.ropensci.org/osfr/articles/auth
  if (!file.exists(file.path(results_path, paste0("dcpo-", str_extract(latest, "\\d{12}"), "-1.csv")))) {
    dir.create(results_path,
               showWarnings = FALSE,
               recursive = TRUE)
    osf_retrieve_node("9uvdk") %>%
      osf_ls_files() %>%
      filter(str_detect(name, str_extract(latest, "\\d{12}"))) %>%
      osf_download(path = here::here("data", "1000"))
  }
  
  dcpo_output <- as_cmdstan_fit(list.files(results_path,
                                           pattern = "\\.csv$",
                                           full.names = TRUE))
}

load(file = here::here("data", "dcpo_input.rda"))

theta_summary <- DCPOtools::summarize_dcpo_results(dcpo_input,
                                                   dcpo_output,
                                                   "theta")

save(theta_summary, file = here::here("data",
                                      "theta_summary.rda"))

theta_results <- extract_dcpo_results(dcpo_input,
                                      dcpo_output,
                                      par = "theta")

save(theta_results, file = here::here("data",
                                      "theta_results.rda"))

alpha_results <- summarize_dcpo_results(dcpo_input,
                                        dcpo_output = dcpo_output,
                                        pars = "alpha") %>% 
  transmute(item = question,
            dispersion = mean)

beta_results <- summarize_dcpo_results(dcpo_input,
                                       dcpo_output,
                                       "beta") %>% 
  group_by(question) %>% 
  summarize(difficulties0 = paste0(sprintf("%.2f", round(mean, 2)),
                                   collapse = ", ")) %>% 
  mutate(item = question,
         cp = if_else(str_detect(item, "threestate"),
                      2, 
                      as.numeric(str_extract(item, "\\d+")) - 1),
         term = str_glue("(( ?-?[0-9]\\.[0-9][0-9]?,?){{{cp}}})"), # escaped dot matches only the decimal point
         difficulties = str_extract(difficulties0, 
                                    term) %>%
           str_replace(",$", "") %>% 
           str_trim()) %>% 
  transmute(item, difficulties)

save(alpha_results,
     beta_results,
     file = here::here("data",
                       "item_results.rda"))
```

```{r theta_summary}
res_cy <- nrow(theta_summary) %>% 
  scales::comma()

res_c <- theta_summary %>% 
  pull(country) %>% 
  unique() %>% 
  length()
```

The model was estimated using the `DCPOtools` package for R [@Solt2019], running four chains for 1,000 iterations each and discarding the first half as warmup, leaving 2,000 samples.
The $\hat{R}$ diagnostic had a maximum value of 1.01, indicating that the model converged. <!-- macrointerest Rhat confirmed -->
The dispersion parameters of the survey items indicate that all of them load well on the latent variable (see Appendix\nobreakspace{}\@ref(surveys)).
The result is a set of estimates of the public's mean political interest, that is, macrointerest, in all `r res_cy` country-years spanned by the source data.
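
As a rough illustration of what the $\hat{R}$ diagnostic compares, the sketch below computes a simplified version (neither split-chain nor rank-normalized, unlike the refined diagnostic that Stan-based tools typically report) on simulated draws. The function `basic_rhat()` and the simulated chains are hypothetical and are not part of the estimation pipeline.

```r
# Simplified R-hat for one parameter.
# `draws` is an iterations x chains matrix of post-warmup samples.
basic_rhat <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  w <- mean(apply(draws, 2, var))      # mean within-chain variance
  b <- n * var(chain_means)            # between-chain variance
  var_plus <- (n - 1) / n * w + b / n  # pooled estimate of posterior variance
  sqrt(var_plus / w)
}

set.seed(324)
# Four well-mixed chains of 500 post-warmup draws each (simulated)
mixed <- matrix(rnorm(500 * 4), nrow = 500, ncol = 4)
# The same chains shifted to four different locations (simulated non-convergence)
stuck <- sweep(mixed, 2, c(-2, 0, 2, 4), "+")

rhat_mixed <- basic_rhat(mixed)
rhat_stuck <- basic_rhat(stuck)
```

Values near one, as for `rhat_mixed`, indicate that the chains agree with each other; values well above one, as for `rhat_stuck`, indicate that they do not.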


# Validating Cross-National Macrointerest {.unnumbered}

That we can generate estimates of macrointerest does not automatically mean that they are suitable for analysis. 
As is the case for any new measure, validation tests of cross-national latent variables are crucially important [see, e.g., @Hu2023].
Figures\nobreakspace{}\@ref(fig:internalval) and\nobreakspace{}\@ref(fig:extval1) provide evidence of this measure's validity with tests of convergent validation and construct validation.
Convergent validation refers to tests of whether a measure is empirically associated with alternative indicators of the same concept [@Adcock2001, 540].
In Figure\nobreakspace{}\@ref(fig:internalval), the macrointerest scores are compared to responses to individual source-data survey items that were used to generate them; this provides an 'internal' convergent validation test [for an example in a similar context, see @Caughey2019, p. 686].

```{r internal_val_dat}
internal_tscs_dat <- dcpo_input_raw1 %>% 
  filter(item == "int4_ess") %>%  
  mutate(title = "European Social Survey",
         neg = FALSE)

internal_cs_dat <- dcpo_input_raw1 %>% 
  filter(survey == "issp2004") %>%  
  mutate(title = "ISSP Citizenship I, 2004",
         neg = FALSE)

internal_ts_dat <- dcpo_input_raw1 %>% 
  filter(item == "int5_allbus") %>%  
  mutate(title = "Germany",
         neg = FALSE)
```

```{r internalval, fig.height = 3.5, fig.cap = "Internal Convergent Validation: Correlations Between Macrointerest and Individual Source-Data Survey Items"}
internal_tscs_plot <- validation_plot(internal_tscs_dat,
                                lab_x = .15,
                                lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 8),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "Macrointerest",
       y = "% 'Very' or 'Quite' Interested in Politics")

internal_cs_plot <- validation_plot(internal_cs_dat,
                                lab_x = .15,
                                lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 8),
        # strip.text.x = element_text(size=5),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "Macrointerest",
       y = "% 'Very' or 'Fairly' Interested in Politics")

internal_ts_plot <- validation_plot(internal_ts_dat,
                                    lab_x = 1987,
                                    lab_y = .95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(ylim = c(0,1)) +
  labs(x = "Year",
       y = "Score") +
  annotate("text", x = 1990, y = .84, size = 2,
           label = 'ALLBUS') +
  annotate("text", x = 2000, y = .38, size = 2,
           label = "Macrointerest")

internal_tscs_plot + internal_cs_plot + internal_ts_plot  +
  patchwork::plot_annotation(caption = "Note: Gray whiskers and shading represent 80% credible intervals.")
```

```{r ext_val_dat, eval=FALSE}
ext_dat <- read_csv(here("data-raw",
                         "surveys_ext.csv"),
                    col_types = "cccccc") %>% 
  DCPOtools::dcpo_setup(datapath = here("..",
                                        "data",
                                        "dcpo_surveys"),
                        file = here("data",
                                    "ext_dat.csv"))
```

```{r ext_dat, include=FALSE}
ext_dat <- read_csv(here("data",
                         "ext_dat.csv"),
                    col_types = "cdcddcd")

ext_wvs_disc_dat <- ext_dat %>%
  filter(str_detect(item, "disc3")) %>% 
  mutate(title = "",
         neg = FALSE)

ext_wvs_news_dat <- ext_dat %>%
  filter(str_detect(item, "news5")) %>% 
  mutate(title = "",
         neg = FALSE)

ext_wvs_imp_dat <- ext_dat %>%
  filter(str_detect(item, "imp4")) %>%
    mutate(title = "",
           neg = FALSE)
```

```{r extval1, fig.cap="Construct Validation: Correlations Between Macrointerest and Other Aspects of Political Engagement", fig.height=3.5}
ext_wvs_news_plot <- validation_plot(ext_wvs_news_dat,
                                     lab_x = .15,
                                     lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "Macrointerest",
       y = "% Who Follow Politics in Traditional Media\nAt Least Several Times a Week")

ext_wvs_disc_plot <- validation_plot(ext_wvs_disc_dat,
                                    lab_x = .15,
                                    lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "Macrointerest",
       y = "% Who Discuss Politics with Friends")

ext_wvs_imp_plot <- validation_plot(ext_wvs_imp_dat,
                                       lab_x = .13,
                                       lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "Macrointerest",
       y = "% Saying Politics is 'Rather' or 'Very'\nImportant in Their Lives")

ext_wvs_news_plot + ext_wvs_disc_plot + ext_wvs_imp_plot +
  plot_annotation(
    caption = "Note: Gray whiskers and shading represent 80% credible intervals. Survey\nitems sourced from World Values Survey and European Values Study.")
```

In the left panel of Figure\nobreakspace{}\@ref(fig:internalval), macrointerest scores are plotted against the percentage of respondents across all country-years who offered the two most interested responses on the European Social Survey's four-point item, "How interested are you in politics?"
The middle panel shows responses to the question with the most data-rich cross-section, "How interested would you say you personally are in politics?" in the ISSP's 2004 Citizenship module.
Finally, the right panel evaluates how well the macrointerest scores capture change over time by focusing on the item with the largest number of observations for a single country in the source data, which asked respondents to Germany's ALLBUS, "How interested in politics are you?"
In all three cases, the correlations, estimated taking into account the uncertainty in the measures, are strong.
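
One way to take measurement uncertainty into account when estimating such a correlation is to compute it once per posterior draw and then summarize across draws. The sketch below illustrates that idea on simulated data; all object names and values are hypothetical, and it should not be read as the implementation behind the figures.

```r
set.seed(324)
n_units <- 30    # hypothetical number of country-years
n_draws <- 1000  # hypothetical number of posterior draws

# Hypothetical latent scores with known standard errors, and an observed
# survey percentage that is related to them
theta_hat <- runif(n_units, 0.2, 0.8)
theta_se <- rep(0.05, n_units)
pct_interested <- 100 * theta_hat + rnorm(n_units, sd = 5)

# Draw plausible values of each latent score (one column per draw),
# then correlate every draw with the observed percentages
theta_draws <- matrix(rnorm(n_units * n_draws,
                            mean = rep(theta_hat, times = n_draws),
                            sd = rep(theta_se, times = n_draws)),
                      nrow = n_units, ncol = n_draws)
r_draws <- apply(theta_draws, 2, cor, y = pct_interested)

# Median correlation and 80% interval across draws
r_summary <- quantile(r_draws, probs = c(0.1, 0.5, 0.9))
```

Sampling uncertainty in the survey percentages could be propagated in the same way, by drawing plausible values for them as well.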

Construct validation refers to demonstrating, for some _other_ concept believed causally related to the concept a measure seeks to represent, that the measure is empirically associated with measures of that other concept [@Adcock2001, 542].
Figure\nobreakspace{}\@ref(fig:extval1) depicts the relationships between macrointerest and three survey items from the WVS and EVS on other aspects of political engagement that are expected to have causal relationships with political interest [see @Kittilson2010, 995]: in the left panel, following political news on television, radio, and newspapers; in the center panel, discussing politics with friends; and on the right, feeling politics is important to one's life.
These relationships are all positive and are moderate to strong.
This cross-national latent variable of macrointerest performs well in validation tests.


# Testing Theories of Macrointerest Cross-Nationally {.unnumbered}
The best developed theories of macrointerest concern the advanced democracies, and even among these relatively similar countries, macrointerest varies greatly.
Figure\nobreakspace{}\@ref(fig:ts) examines levels and trends in macrointerest in advanced democratic countries by displaying changes in the public's expressed interest in politics over time in the thirty-seven democracies of the OECD (Appendix\nobreakspace{}\@ref(trajectory) presents these macrointerest data for all available countries).
While macrointerest scores approach and often exceed .6 in countries such as Denmark and Canada, in Chile they scarcely cross .25.
And although the public's political interest has held fairly steady over decades in many countries, in Czechia it dropped by nearly half of the variable's entire theoretical range over the 1990s and 2000s before rebounding slightly after 2010, and increases of roughly a quarter of that range can be seen in, among others, Germany.
There are considerable differences in the extent to which the public professes interest in politics both across countries and over time.

```{r ts, fig.cap="Macrointerest Scores Over Time Within OECD Democracies \\label{ts}", fig.height=9}
oecd_countries <- c("Australia", "Austria", "Belgium",
                    "Canada", "Chile", "Colombia",
                    "Costa Rica", "Czechia", "Denmark",
                    "Estonia", "Finland", "France", 
                    "Germany", "Greece", "Hungary",
                    "Iceland", "Ireland", "Israel",
                    "Italy", "Japan", "South Korea",
                    "Latvia", "Lithuania", "Luxembourg",
                    "Mexico", "Netherlands", "New Zealand",
                    "Norway", "Poland", "Portugal", 
                    "Slovakia", "Slovenia", "Spain",
                    "Sweden", "Switzerland", "Turkey", 
                    "United Kingdom", "United States")

c_res <- theta_summary %>% 
  filter(country %in% oecd_countries) %>%
  mutate(country = fct_reorder(country, mean, .desc = TRUE))

ggplot(data = c_res, aes(x = year, y = mean)) +
  theme_bw() +
  theme(legend.position = "none") +
  coord_cartesian(xlim = c(1982, 2022), ylim = c(0, 1)) +
  labs(x = NULL, y = "Macrointerest") +
  geom_ribbon(data = c_res, aes(ymin = q10, ymax = q90, linetype=NA), alpha = .25) +
  geom_line(data = c_res) +
  facet_wrap(~country, ncol = 5) +
  theme(axis.text.x  = element_text(size=7,
                                    angle = 90,
                                    vjust = .45,
                                    hjust = .95),
        strip.background = element_rect(fill = "white", colour = "white")) +
  plot_annotation(caption = "Note: Countries are ordered by their median macrointerest score; gray shading represents 80% credible intervals.")
```

What accounts for these differences?
The literature offers a range of arguments for how the political context may influence the public's interest in politics.
Perhaps the most straightforward is that publics grow more interested in politics at election time.
Campaigns and elections attract media coverage and increase the information available to the public on the issues being contested, leading to increased interest in politics [see, e.g., @Larsen2022].
Macrointerest within a country should be expected to be higher, therefore, in years in which national elections take place than in years without elections.

A second argument is that political institutions that share power, rather than concentrate it, yield politics that are more interesting and engaging.
Building on @Lijphart1999 and @Powell2000, @Kittilson2010 [, 992] argues that power-sharing institutions---parliamentarism, federalism, and proportional electoral rules---"send signals of inclusiveness to citizens, generating greater political engagement" while power-concentrating institutions "may generate perceptions of exclusion and deter involvement."
Macrointerest should be higher in countries with parliamentary and federal systems than in those without those features, and it should decline as the disproportionality between votes cast and seats won increases.

A third claim deals with the public's demand for accountability.
@Peterson2022 [, 203] argues "when there is information that something has gone wrong\ldots then voters should be more likely to attend to the actions of elected officials," but when "there is evidence of success\ldots voters should not waste their energies."
If democracy is a principal-agent problem with elected officials acting as self-interested agents and the public as their lazy but vengeful principal, then macrointerest should rise when times are bad and decline as conditions improve.

A final set of theories---each well established---contradicts the third.
Modernization theory holds that the public's interest will increase as the national economy grows and household incomes expand [see, e.g., @Inglehart2005].
Unemployment has long been argued not to motivate but to depress political interest [see, e.g., @Rosenstone1982, 26].
And the relative power theory holds that greater income inequality, by increasingly concentrating political power in the hands of the wealthy, allows them greater power to shape the political agenda in ways that discourage the broader public from taking interest [see, e.g., @Solt2008].
In each of these circumstances, macrointerest is argued to increase in good, not bad, economic conditions [see also @Stimson2015; @Peterson2022, 206].

Data to test these hypotheses are drawn from several sources.
The Democratic Electoral Systems (DES) dataset updated in @Bormann2022 provides information about the timing of elections, yielding a dichotomous variable coded one in election years and zero when no election was held.
The three institutional variables are measured as in @Kittilson2010.
Data on parliamentarism, a dichotomous variable coded one in pure parliamentary systems and zero otherwise, are sourced from the DES.
Federalism is likewise dichotomous, coded one in countries with strong federal systems [see @Lijphart1999] and zero in all others.
Proportionality in the electoral system is measured using the Gallagher least-squares index of disproportionality, which measures the disparity between parties' vote shares and their seat shares [@Gallagher1991, 40-41; @Gallagher2023].
The context of good and bad economic conditions is measured with data on GDP per capita, national GDP growth, and unemployment from OECD.Stat [@OECD2023] and on the Gini index of disposable income inequality from the Standardized World Income Inequality Database [@Solt2020].
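
For readers unfamiliar with the Gallagher index, it is the square root of half the sum of squared differences between each party's vote share and seat share, $\sqrt{\frac{1}{2}\sum_i (v_i - s_i)^2}$, with shares in percentage points. A minimal sketch with a hypothetical three-party election follows; the function name and the shares are illustrative only.

```r
# Gallagher least-squares index of disproportionality.
# Vote and seat shares are in percentage points.
gallagher_lsq <- function(vote_share, seat_share) {
  stopifnot(length(vote_share) == length(seat_share))
  sqrt(0.5 * sum((vote_share - seat_share)^2))
}

# Hypothetical three-party election (shares in percent)
votes <- c(45, 35, 20)
seats <- c(55, 35, 10)

lsq_example <- gallagher_lsq(votes, seats)  # largest party overrepresented
lsq_prop <- gallagher_lsq(votes, votes)     # perfectly proportional outcome
```

In this example the largest party gains ten points and the smallest loses ten, yielding an index of 10, while a perfectly proportional outcome yields zero.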

```{r des_download}
if (!file.exists(here("data-raw", "es_data-v41", "es_data-v4_1.csv"))) {
  dir.create(here("data-raw", "es_data-v41"),
             showWarnings = FALSE,
             recursive = TRUE)
  # mode = "wb" keeps binary files intact on Windows
  download.file("http://mattgolder.com/files/research/es_v4_codebook.pdf",
                here("data-raw", "es_data-v41", "es_v4_codebook.pdf"),
                mode = "wb")
  download.file("http://mattgolder.com/files/research/es_data-v41.zip",
                here("data-raw", "es_data-v41", "es_data-v41.zip"),
                mode = "wb")
  unzip(here("data-raw", "es_data-v41", "es_data-v41.zip"),
        exdir = here("data-raw", "es_data-v41"))
}
```

```{r des}
des_data <- vroom::vroom(here("data-raw",
                              "es_data-v41",
                              "es_data-v4_1.csv"),
                         guess_max = 5000,
                         show_col_types = FALSE) %>% 
  mutate(country = countrycode(country,
                               origin = "country.name",
                               destination = "country.name",
                               custom_match = c("Serbia & Montenegro" =
                                                  "Serbia",
                                                "The Co-operative Republic of Guyana" = "Guyana"))) %>% 
  filter(country %in% oecd_countries &
           country!="Turkey" &
           year > 1976) %>% 
  arrange(country, year) %>% 
  fill() %>% 
  group_by(country, year) %>% 
  summarize(regime = first(regime))
```

```{r disp}
if (!file.exists(here::here("data", "gallagher_data.rda"))) {
  if (!file.exists(here("data-raw", "ElectionIndices.pdf"))) {
    download.file("https://www.tcd.ie/Political_Science/about/people/michael_gallagher/ElSystems/Docts/ElectionIndices.pdf",
                  here("data-raw", "ElectionIndices.pdf"),
                  mode = "wb") # binary mode keeps the PDF intact on Windows
  }
  
  gallagher_uk <- extract_text(here("data-raw", "ElectionIndices.pdf"),
                               pages = 46,
                               area = list(c(69,
                                             86, 
                                             111, 
                                             424)))[[1]] %>% 
    as_tibble() %>% 
    separate_longer_delim(value, " \n") %>% 
    separate(value, into = c("year", "lsq", "n_v", "n_s", "seats"), sep = "\\s") %>% 
    mutate(country = "United Kingdom",
           year = as.numeric(year),
           lsq = as.numeric(lsq),
           n_v = as.numeric(n_v), 
           n_s = as.numeric(n_s),
           seats = as.numeric(seats)) %>% 
    filter(!is.na(year))
  
  if (!file.exists(here("data-raw", "Disproportionality.csv"))) {
    # download.file() requires a destination file
    download.file("https://raw.githubusercontent.com/christophergandrud/Disproportionality_Data/master/Disproportionality.csv",
                  here("data-raw", "Disproportionality.csv"))
  }
  
  disp_add <- rio::import(here("data-raw", "Disproportionality.csv")) %>% 
    filter((country == "Korea, Republic of" & year < 2000) | (country == "Colombia")) %>% 
    select(country, year, lsq = disproportionality) %>% 
    mutate(country = countrycode(country, "country.name", "country.name"))
  
  gallagher0 <- map_df(5:48, ~ extract_tables(here("data-raw",
                                                   "ElectionIndices.pdf"),
                                              pages = c(.x))[[1]] %>% 
                         as_tibble())
  
  gallagher <- gallagher0 %>% 
    mutate(V1 = case_when(V1 == "Republic" ~ "Dominican Rep",
                          V1 == "Ireland LSq" ~ "Northern Ireland",
                          V1 == "Kingdom" ~ "United Kingdom",
                          V1 == "(House)" ~ "United States",
                          V1 == "Scotland" ~ "Scotland",
                          TRUE ~ V1),
           country = countrycode(V1, "country.name", "country.name",
                                 warn = FALSE),
           country = case_when(V1 == "elections" ~ "Ireland EP",
                               V1 == "college)" ~ "U.S. Electoral College",
                               V1 == "Ireland LSq" ~ "Northern Ireland",
                               V1 == "Wales" ~ "Wales",
                               V1 == "Principe" ~ "Principe",
                               TRUE ~ country),
           country2 = country) %>% 
    fill(country) %>% 
    filter(is.na(country2) & !V1 == "See Notes.") %>% 
    separate_wider_delim(V1, 
                         delim = " ", 
                         names = c("year", "info"),
                         too_few = "align_start",
                         too_many = "merge") %>% 
    mutate(lsq = str_replace_all(info, "[^\\d.]", ""),
           V2 = if_else(V2 == "", lsq, V2),
           month = str_extract(info, "[A-Z][a-z]{2}\\b") %>% 
             match(month.abb)) %>% 
    filter((is.na(info) | !str_detect(info, "PR|list|SMD|SMP")) ) %>% 
    transmute(country = country,
              year = as.numeric(year),
              lsq = as.numeric(V2),
              info = info,
              month = month) %>% 
    filter(!is.na(lsq)) %>% 
    bind_rows(gallagher_uk, disp_add) %>% 
    group_by(country, year) %>% 
    arrange(country, -year, -month) %>% 
    distinct(country, year, .keep_all = TRUE) %>% 
    arrange(country, year) %>% 
    select(country, year, lsq)
  
  rio::export(gallagher, here::here("data", "gallagher_data.rda"))
} else {
  gallagher <- rio::import(here::here("data", "gallagher_data.rda"))
}    
```

```{r oecd}
if (!file.exists(here::here("data-raw", "oecd_data.rda"))) {
  
  oecd_growth_link <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/SNA_TABLE1/AUS+AUT+BEL+CAN+CHL+COL+CRI+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA.B1_GE.G/all?startTime=1980&endTime=2023"
  
  oecd_growth <- oecd_growth_link %>% 
    readSDMX() %>% 
    as_tibble() %>% 
    transmute(country = countrycode(LOCATION, "iso3c", "country.name"),
              year = as.numeric(obsTime),
              growth = obsValue)
  
  oecd_unemployment_link <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/LFS_SEXAGE_I_R/AUS+AUT+BEL+CAN+CHL+COL+CRI+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA.MW.1564.UR.A/all?startTime=1980&endTime=2023"
  
  oecd_unemp <- oecd_unemployment_link %>% 
    readSDMX() %>% 
    as_tibble() %>% 
    transmute(country = countrycode(COUNTRY, "iso3c", "country.name"),
              year = as.numeric(obsTime),
              unemployment = obsValue)
  
  oecd_inflation_link <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/PRICES_CPI/AUS+AUT+BEL+CAN+CHL+COL+CRI+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA.CPALTT01.GY.A/all?startTime=1980&endTime=2023"
  
  oecd_inf <- oecd_inflation_link %>% 
    readSDMX() %>% 
    as_tibble() %>% 
    transmute(country = countrycode(LOCATION, "iso3c", "country.name"),
              year = as.numeric(obsTime),
              inflation = obsValue)
  
  oecd_gdppc_link <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/SNA_TABLE1/AUS+AUT+BEL+CAN+CHL+COL+CRI+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA.B1_GE.HVPVOB/all?startTime=1980&endTime=2023"
  
  oecd_gdppc <- oecd_gdppc_link %>% 
    readSDMX() %>% 
    as_tibble() %>% 
    transmute(country = countrycode(LOCATION, "iso3c", "country.name"),
              year = as.numeric(obsTime),
              gdppc = obsValue)
  
  oecd <- oecd_gdppc %>% 
    left_join(oecd_growth, by = c("country", "year")) %>% 
    left_join(oecd_unemp, by = c("country", "year")) %>% 
    left_join(oecd_inf, by = c("country", "year")) %>% 
    group_by(country) %>% 
    mutate(across(gdppc:inflation,
                  ~ imputeTS::na_interpolation(.x)))
  
  rio::export(oecd, here::here("data-raw", "oecd_data.rda"))
} else {
  oecd <- rio::import(here::here("data-raw", "oecd_data.rda"))
}

```

```{r swiid_data}
if (!file.exists(here("data-raw",
                      "swiid9_6",
                      "swiid9_6_summary.csv"))) {
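  # Save to the same project-rooted path that unzip() reads from below
  download.file("https://dataverse.harvard.edu/api/access/datafile/7878619",
                here("data-raw", "swiid9_6.zip"),
                mode = "wb") # binary mode keeps the zip intact on Windows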
  unzip(here("data-raw", "swiid9_6.zip"), exdir = here("data-raw"))
  file.remove(here("data-raw", "swiid9_6.zip"))
}

swiid_summary <- read_csv(here("data-raw",
                               "swiid9_6",
                               "swiid9_6_summary.csv"),
                          col_types = "cddddddddd") %>% 
  mutate(country = countrycode::countrycode(country,
                                            "country.name",
                                            "country.name",
                                            warn = FALSE)) %>% 
  select(country, year, gini_disp, gini_disp_se)
```

```{r data_combo}
if (!file.exists(here::here("data", "data_combo.rda"))) {
data_combo <- theta_summary %>% 
    filter(country %in% oecd_countries &
             country!="Turkey") %>% 
    group_by(country) %>% 
    group_modify(~ add_row(.x, year = 0:4, .before = 0)) %>%
    mutate(year = case_when(year == 0 ~ lead(year, 5) - 5,
                            year == 1 ~ lead(year, 4) - 4,
                            year == 2 ~ lead(year, 3) - 3,
                            year == 3 ~ lead(year, 2) - 2,
                            year == 4 ~ lead(year, 1) - 1,
                            TRUE ~ year)) %>% 
  left_join(des_data, by = c("country", "year")) %>%
  left_join(gallagher,
            by = c("country", "year")) %>% 
  mutate(parl = if_else(regime == 0, 1, 0),
         election = as.numeric(!is.na(regime) | !is.na(lsq))) %>% 
  select(-regime) %>% 
  fill(parl, .direction = "downup") %>% 
  left_join(oecd,
            by = c("country", "year")) %>% 
  left_join(swiid_summary,
            by = c("country", "year")) %>% 
  fill(lsq) %>% 
  drop_na(mean:gini_disp_se) %>% 
  mutate(federal = as.numeric(country %in% c("Australia", # decentralized = strong
                                             "Belgium",
                                             "Canada",
                                             "Germany",
                                             "Mexico",
                                             "Switzerland",
                                             "United States")),
         lsq_mean = mean(lsq),
         lsq_diff = lsq - lsq_mean,
         gini_mean = mean(gini_disp),
         gini_mean_se = sqrt(sum(gini_disp_se^2))/
           length(gini_disp),
         gini_diff = (gini_disp - gini_mean),
         gini_diff_se = sqrt(gini_disp_se^2 + gini_mean_se^2)/2,
         gdppc_mean = mean(gdppc/1000),
         gdppc_diff = gdppc/1000 - gdppc_mean,
         growth_mean = mean(growth),
         growth_diff = growth - growth_mean,
         recession = if_else(growth >= 0, 0, 1),
         unemploy_mean = mean(unemployment),
         unemploy_diff = unemployment - unemploy_mean,
         inflation_mean = mean(inflation),
         inflation_diff = inflation - inflation_mean) %>% 
  ungroup()

  rio::export(data_combo, here::here("data", "data_combo.rda"))
} else {
  data_combo <- rio::import(here::here("data", "data_combo.rda"))
}
```

```{r analysis, include=FALSE}
if (!file.exists(here("data", "results.rda"))) {
  m1 <- brm(
    formula = bf(
      mean * 100 | mi(sd * 100) ~ election +
        parl +
        federal +
        lsq_mean + lsq_diff +
        gdppc_mean + gdppc_diff +
        growth_mean + growth_diff +
        unemploy_mean + unemploy_diff +
        me(gini_mean, gini_mean_se) +
        me(gini_diff, gini_diff_se) +
        (1 | country) + (1 | year)
    ),
    data = data_combo,
    backend = "cmdstanr",
    warmup = 1000,
    iter = 2000,
    chains = 4,
    cores = parallel::detectCores(),
    seed = 324
  )
  
  doubled_sd <- m1$data %>% 
    select(-`mean * 100`, -mean, -sd,
           -country, -year, -ends_with("_se")) %>% 
    summarize(across(everything(), by2sd)) %>% 
    pivot_longer(everything()) %>% 
    transmute(`.variable` = case_when(name == "gini_mean" ~ 
                                        "bsp_megini_meangini_mean_se",
                                      name == "gini_diff" ~
                                        "bsp_megini_diffgini_diff_se",
                                      TRUE ~ paste0("b_", name)),
              var_names = c("Election Year",
                            "Parliamentarism",
                            "Federalism",
                            "Disproportionality, Mean",
                            "Disproportionality, Difference",
                            "GDPpc, Mean",
                            "GDPpc, Difference",
                            "GDP Growth, Mean",
                            "GDP Growth, Difference",
                            "Unemployment, Mean",
                            "Unemployment, Difference",
                            "Income Inequality, Mean",
                            "Income Inequality, Difference"),
              sd2 = value)
  
  coef_data0 <- m1 %>% 
    tidybayes::gather_draws(`bs?p?_.*`, regex = TRUE) %>% 
    filter(!`.variable`=="b_Intercept") %>% 
    left_join(doubled_sd, by = join_by(.variable))
  
  cy_summary <- m1$data %>%
    count(country) %>%
    pull(n) %>%
    summary()

  save(m1, doubled_sd, coef_data0, cy_summary, file = here::here("data", "results.rda"))
} else {
  load(here("data", "results.rda"))
}

ess_perc <- {{dcpo_input_raw1 %>% 
    filter(country %in% oecd_countries & r==1) %>%
    count(item) %>%
    arrange(desc(n)) %>%
    slice(1) %>%
    pull(n)} * 100 / nrow(m1$data)} %>% 
  round()
```

The resulting dataset comprises the thirty-seven OECD democracies, each observed in twenty-one (Mexico) to forty (Ireland, Italy, the United Kingdom, and the United States) consecutive years (mean: `r cy_summary %>% nth(4) %>% round(1)` years, median: `r cy_summary %>% nth(3)` years).
Even among these relatively data-rich countries, our measure of macrointerest provides much more data than would otherwise be available: the richest single survey for these cases, the European Social Survey, covers only `r ess_perc`% of these country-years, does not provide annual data, and of course excludes entirely the nine OECD members in the Americas and around the Pacific Rim (see Appendix\nobreakspace{}\@ref(ESScomparison)).

@Shor2007 demonstrate that such pooled time series are best analyzed using a Bayesian multilevel model including varying intercepts for each country and each year.
The former help account for heteroskedasticity across space due to, e.g., omitted variable bias, while permitting the inclusion of time-invariant predictors such as parliamentarism and federalism.
The latter take into account 'time shocks' that operate on all countries simultaneously [@Shor2007, 171-172].
Further, the 'within-between random effects' specification is employed, meaning each of the time-varying predictors is decomposed into its time-invariant country mean and the difference between each country-year value and this country mean; this specification is superior to fixed effects and other commonly used TSCS specifications for addressing omitted variable bias and endogeneity [@Bell2015].
The time-varying difference variables capture the short-term effects of the predictors, while the time-invariant country-mean variables reflect their---often different---long-run, "historical" effects [@Bell2015, 137].
Moreover, as we employ a Bayesian analysis, it is straightforward to incorporate the measurement uncertainty in the data for both macrointerest and income inequality directly into the model, with the estimated values of these variables treated as random draws from distributions with unknown true means but known standard deviations [@McElreath2016, 425-431; see also @Kurz2023, 15.1.2].
The model was estimated using the `brms` R package [@Burkner2017].
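
As a compact sketch of this specification, consider a single time-varying predictor $x_{it}$ and a single time-invariant predictor $z_i$ for country $i$ in year $t$.
The notation is introduced here for illustration only, the normal likelihood is an assumption of the sketch, and the remaining predictors and the priors are omitted for brevity.

$$
\begin{aligned}
\bar{x}_i &= \frac{1}{T_i}\sum_{t} x_{it}, \qquad \tilde{x}_{it} = x_{it} - \bar{x}_i \\
y^{\mathrm{obs}}_{it} &\sim \mathrm{Normal}(y_{it},\, s_{it}) \\
y_{it} &\sim \mathrm{Normal}(\mu_{it},\, \sigma) \\
\mu_{it} &= \alpha + \alpha_i + \gamma_t + \beta^{W}\tilde{x}_{it} + \beta^{B}\bar{x}_i + \delta z_i \\
\alpha_i &\sim \mathrm{Normal}(0,\, \tau_{\alpha}), \qquad \gamma_t \sim \mathrm{Normal}(0,\, \tau_{\gamma})
\end{aligned}
$$

Here $y^{\mathrm{obs}}_{it}$ is the estimated macrointerest score with known standard deviation $s_{it}$, $\alpha_i$ and $\gamma_t$ are the varying country and year intercepts, $\beta^{W}$ is the short-term (within-country) effect, and $\beta^{B}$ is the long-run (between-country) effect.
Measurement uncertainty in income inequality enters analogously, with the observed value treated as a draw from a distribution centered on the unknown true value with a known standard deviation.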

```{r resplot, fig.cap="Predicting Macrointerest in OECD Democracies \\label{model}", fig.height = 5, fig.width = 7.5}

ordered <- doubled_sd %>%
  pull(var_names) %>% 
  rev()

coef_data <- coef_data0 %>% 
  mutate(std_coef = round(.value * sd2, 1),
         term = factor(var_names, levels = ordered)) %>%
  ggdist::median_qi(std_coef, .width = c(.8, .9, .95)) %>%
  mutate(ci = paste0(round(.lower, 1), " to ", round(.upper, 1))) %>%
  left_join(doubled_sd, ., by = join_by(.variable))

coef_data0 %>% 
  mutate(std_coef = .value * sd2,
         term = factor(var_names, levels = ordered)) %>% 
  ggplot(aes(y = term, x = std_coef)) +
  stat_halfeye(.width = c(.8, .9, .95)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  coord_cartesian(xlim = c(-15, 15)) +
  theme_light() +
  xlab(NULL) +
  ylab(NULL) +
  plot_annotation(caption = "Notes: Dots indicate posterior medians; whiskers, from thickest to thinnest, describe 80%,\n90%, and 95% credible intervals; shading depicts the posterior probability density function.")
```


Figure\nobreakspace{}\@ref(fig:resplot) displays the results.^[
Appendix\nobreakspace{}\@ref(resultstable) provides a tabular version.]
Consistent with the argument that campaigns bring attention-grabbing information to the public, macrointerest in election years is found to be `r get_coef("election", type = "std_coef")` points (95% credible interval: `r get_coef("election", type = "ci")` points) higher than in years without elections.
This accords with previous research finding small but well-estimated increases in political interest in election years [see, e.g., @Larsen2022].

The hypothesis that power-sharing institutions yield more public interest in politics is also supported.
Macrointerest is estimated to be `r get_coef("parl")` points higher in countries with parliamentary systems.
The point estimate for the difference in macrointerest between countries with and without federalism is `r get_coef("federal", type = "std_coef")` points, with `r round(p_direction(m1, parameters = "federal")[[2]]*100, 1)`% of the posterior distribution greater than zero.
And although disproportionality is not estimated to have long-run effects that consistently distinguish countries with more or less proportional electoral results, _changes_ in disproportionality appear to have an immediate negative effect: a two-standard-deviation increase in the Gallagher index yields `r get_coef("lsq_diff", type = "std_coef", abs = TRUE)` points less macrointerest (95% c.i.: `r get_coef("lsq_diff", type = "ci")`).
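
For reference, the two-standard-deviation effects reported in this section rescale each raw coefficient by twice the standard deviation of its predictor.
In notation assumed here for illustration, with $\beta_k$ the coefficient on predictor $k$ and $s_k$ that predictor's standard deviation:

$$
\beta_k^{(2\,\mathrm{SD})} = \beta_k \times 2 s_k
$$

The rescaling is applied to each posterior draw, and the draws are then summarized by their median and quantile-based credible intervals.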

Regarding the debate on whether macrointerest is invigorated or instead discouraged by bad times, the evidence of our cross-national analysis of the impact of economic conditions falls on the side of the latter.
Supporting modernization theory, increases in per capita GDP have a positive short-term effect on aggregate political interest, with a two-standard-deviation increase associated with `r get_coef("gdppc_diff")` points more macrointerest.
The point estimate for the long-term, historical effect as evidenced by differences in mean levels across countries is found to be `r get_coef("gdppc_mean", type = "std_coef")` points, albeit with only `r round(p_direction(m1, parameters = "gdppc_mean")[[2]]*100, 1)`% of the posterior distribution greater than zero.
As predicted by relative power theory, the long-term effects of income inequality are strongly negative, with a two-standard-deviation difference across countries associated with `r get_coef("gini_mean", type = "std_coef", abs = TRUE)` points less macrointerest (95% c.i.: `r get_coef("gini_mean", type = "ci")` points).
Year-to-year changes in income inequality are found to make little difference---it would seem that, from one perspective, the influence of the wealthy over the political agenda does not change on such a short time scale, and from the other, that the public does not react to worsening conditions in the distribution of income with greater interest in its agents' actions.
The results with regard to growth in the national economy and with regard to unemployment similarly do not provide strong evidence of either negative or positive effects.
Still, taken as a whole, this evidence indicates that at least with regard to economic conditions, it is good times, not bad ones, that yield more macrointerest.


# Conclusions {.unnumbered}

Macrointerest, despite its theoretical importance, has as yet drawn only limited empirical attention.
This oversight largely reflects the paucity of available data to measure this important concept.
The cross-national macrointerest dataset presented here addresses this issue, providing annual time series across more than a hundred countries and allowing more and better tests of the wide range of theories that implicate the public's interest in politics.
For example, while the cross-sectional analysis in @Kittilson2010 [, 997-999] finds that, among the three inclusive institutions it considered, only the disproportionality of electoral results influenced political interest and engagement, the pooled time-series analysis presented here indicates that parliamentarism, federalism, and proportionality all yield greater macrointerest, as that work theorizes.
And although the single-country study in @Peterson2022 [, 219] concludes that bad times prompt increased macrointerest, this evidence shows the opposite: at least with regard to the economy, it is _good_ conditions that lead the public to take interest in politics.
By drawing on information about _both_ differences across countries _and_ change over time, it appears these data on cross-national macrointerest provide a firmer basis for drawing sound conclusions.
The cross-national macrointerest dataset is available on the Harvard Dataverse for use in the further investigation of these and other theories on the causes and consequences of aggregate political interest as well as its relationships with other aspects of political engagement.


\pagebreak

# References {.unnumbered}

::: {#refs-text}
:::

\pagebreak

# (APPENDIX) Supplementary material {-}

# Survey Items Used to Estimate Macrointerest {#surveys}
# The DCPO Model {#dcpo}
# Comparing Coverage of the Macrointerest Data and the ESS {#ESScomparison}
# Macrointerest Scores Over Time {#trajectory}
# Tabular Version of Results Presented in Figure \@ref(fig:resplot) {#resultstable}
# Comparison with @Peterson2022 {#peterson}